1 Introduction

Here we will analyze the case CLL9 alone. Previously, we compared RT vs CLL using all samples, without defining clusters. However, we know from our study that there exists intratumor heterogeneity in both CLL and RT samples. That is, in RT samples we can still find a CLL cluster, and viceversa. Thus, we will focus exclusively on the case CLL9, for which we have two samples, before and after RT. We will:

  • Check whether we can find an early seeding of RT in the CLL sample.
  • Perform a differential expression analysis (DEA) between RT and CLL clusters.

2 Pre-processing

2.1 Load packages

library(Seurat)
library(tidyverse)
library(ggrepel)
library(DT)
library(openxlsx)

2.2 Define parameters

# Paths
path_to_obj <- here::here("results/R_objects/Penter2021/3.seurat_obj_clustered_Penter2021.rds")
path_to_save <- here::here("results/R_objects/Penter2021/4.seurat_obj_CLL9_Penter2021.rds")
path_to_save_xlsx <- here::here("results/tables/DEA/dea_RT_vs_CLL_CLL9_Penter2021.xlsx")


# Colors
color_palette <- c("black", "gray", "red", "yellow", "violet", "green4",
                   "blue", "mediumorchid2", "coral2", "blueviolet",
                   "indianred4", "deepskyblue1", "dimgray", "deeppink1",
                   "green", "lightgray", "hotpink1")
cols_rt <- c("#dcdddc", "#cd8899")


# Source functions
source(here::here("bin/utils.R"))


# Thresholds
alpha <- 0.05
min_avg_log2FC <- 0.25
selected_avg_log2FC <- 0.9
selected_pct_cells <- 10
selected_significance_alpha <- alpha

2.3 Load data

seurat <- readRDS(path_to_obj)

3 Subset and cluster CLL9

# Subset
seurat$donor_id <- str_remove(seurat$sample_id, "_.*$")
seurat <- subset(seurat, donor_id == "CLL9")
seurat
## An object of class Seurat 
## 33593 features across 28477 samples within 1 assay 
## Active assay: RNA (33593 features, 2000 variable features)
##  3 dimensional reductions calculated: pca, umap, harmony
# Reprocess
seurat <- process_seurat(seurat, dims = 1:30)
DimPlot(
  seurat,
  group.by = "disease_state",
  pt.size = 1.2,
  split.by = "disease_state",
  cols = cols_rt
) +
  theme(plot.title = element_blank())

DimPlot(
  seurat,
  group.by = "sample_description",
  split.by = "sample_description",
  cols = color_palette
) +
  theme(plot.title = element_blank())

DimPlot(seurat, group.by = "tissue", split.by = "tissue") +
  theme(plot.title = element_blank())

# Cluster
seurat <- FindNeighbors(seurat, dims = 1:30, reduction = "pca")
seurat <- FindClusters(seurat, resolution = 0.6)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 28477
## Number of edges: 749678
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8184
## Number of communities: 11
## Elapsed time: 5 seconds
DimPlot(seurat, cols = color_palette)

# Markers
markers_all <- FindAllMarkers(seurat, only.pos = TRUE, logfc.threshold = 0.5)
DT::datatable(markers_all, options = list(scrollX = TRUE))
# Cell cycle scoring
seurat <- CellCycleScoring(
  seurat,
  s.features = cc.genes.updated.2019$s.genes,
  g2m.features = cc.genes.updated.2019$g2m.genes
)
FeaturePlot(seurat, c("S.Score", "G2M.Score"))

  • Clusters 0, 3, and 4 are homogeneous and they correspond to RT cluster. We see how they express a plethora of genes involved in mitochondrial metabolism: ATP5MF, NDUFS6, UQCR10, SOD1, etc.
  • Clusters 1, 2, and 5 are bona-fide CLL clusters.
  • Cluster 7 corresponds to the MIR155HGhi cluster that we have found in our own samples.
  • Cluster 8 likely corersponds to poor-quality cells, because its main markers are genes encoded in the mitochondrial genome. However, it also shows a high expression of important genes such as CD79B and ZAP70, so we will keep it.
  • Cluster 6 displays a signature of poor-quality cells (MALAT1), and unfiltered erythrocytes/erythroblasts (HBB), so we will remove it:
table(seurat$seurat_clusters, seurat$disease_state)
##     
##      CLL active disease CLL Richter's transformation
##   0                   3                         6876
##   1                5163                           76
##   2                4476                          251
##   3                   0                         3696
##   4                   5                         2575
##   5                1720                          120
##   6                  34                         1264
##   7                1149                           95
##   8                   9                          559
##   9                  50                          180
##   10                  1                          175
seurat <- subset(seurat, seurat_clusters != "6")
seurat <- process_seurat(seurat, dims = 1:30)
DimPlot(
  seurat,
  group.by = "disease_state",
  pt.size = 1.2,
  split.by = "disease_state",
  cols = cols_rt
) +
  theme(plot.title = element_blank())

DimPlot(seurat, group.by = "tissue", split.by = "tissue") +
  theme(plot.title = element_blank())

DimPlot(
  seurat,
  group.by = "sample_description",
  split.by = "sample_description",
  cols = color_palette
) +
  theme(plot.title = element_blank())

We still see a cluster of remaining erythroblasts/erythrocytes, let us find it and remove it:

FeaturePlot(seurat, c("HBM", "HBA2", "HBD"))

seurat <- FindNeighbors(seurat, dims = 1:30, reduction = "pca")
seurat <- FindClusters(seurat, resolution = 1.25)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 27179
## Number of edges: 720581
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.7406
## Number of communities: 15
## Elapsed time: 5 seconds
DimPlot(seurat, cols = color_palette)

seurat <- subset(seurat, seurat_clusters != "14")

Let us perform a soft clustering so we can easily fetch both RT and CLL populations:

seurat <- FindNeighbors(seurat, dims = 1:30, reduction = "pca")
seurat <- FindClusters(seurat, resolution = 0.05)
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
## 
## Number of nodes: 26998
## Number of edges: 717819
## 
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.9736
## Number of communities: 2
## Elapsed time: 5 seconds
DimPlot(seurat)

seurat$annotation <- case_when(
  seurat$seurat_clusters == "0" ~ "RT",
  seurat$seurat_clusters == "1" ~ "CLL"
)

4 Differential expression analysis

Idents(seurat) <- "annotation"
saveRDS(seurat, here::here("7-revision/Penter2021/tmp/seurat_tmp.rds"))
dea <- FindMarkers(
  seurat,
  ident.1 = "RT",
  ident.2 = "CLL",
  only.pos = FALSE,
  logfc.threshold = 0
)
dea <- dea %>%
  arrange(desc(avg_log2FC))
dea$direction <- case_when(
  dea$p_val_adj < alpha & dea$avg_log2FC > min_avg_log2FC ~ "up",
  dea$p_val_adj < alpha & dea$avg_log2FC < 0 & abs(dea$avg_log2FC) > min_avg_log2FC ~ "down",
  TRUE ~ "not sig."
)
pct_cells <- Matrix::rowMeans(seurat[["RNA"]]@counts > 0) * 100
dea$gene <- rownames(dea)
dea$pct_cells_expressing <- pct_cells[dea$gene]


# Inspect table
DT::datatable(dea, options = list(scrollX = TRUE))
# MA plot
(ma_gg <- ma_plot(
  dea,
  selected_avg_log2FC = selected_avg_log2FC,
  selected_pct_cells = selected_pct_cells,
  selected_significance_alpha = selected_significance_alpha
))

5 Early seeding

DimPlot(
  seurat,
  group.by = "annotation",
  split.by = "sample_description",
  cols = cols_rt,
  pt.size = 1
)

6 Save

saveRDS(seurat, path_to_save)
openxlsx::write.xlsx(dea, path_to_save_xlsx)

7 Session Information

sessionInfo()
## R version 4.1.1 (2021-08-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Red Hat Enterprise Linux Server release 6.7 (Santiago)
## 
## Matrix products: default
## BLAS:   /apps/R/4.1.1/lib64/R/lib/libRblas.so
## LAPACK: /apps/R/4.1.1/lib64/R/lib/libRlapack.so
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] openxlsx_4.2.5     DT_0.20            ggrepel_0.9.1      forcats_0.5.1      stringr_1.4.0      dplyr_1.0.7        purrr_0.3.4        readr_2.1.1        tidyr_1.1.4        tibble_3.1.6       ggplot2_3.3.5      tidyverse_1.3.1    SeuratObject_4.0.4 Seurat_4.0.6       BiocStyle_2.22.0  
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1          backports_1.4.1       plyr_1.8.6            igraph_1.2.11         lazyeval_0.2.2        splines_4.1.1         crosstalk_1.2.0       listenv_0.8.0         scattermore_0.7       digest_0.6.29         htmltools_0.5.2       fansi_1.0.0           magrittr_2.0.1        tensor_1.5            cluster_2.1.2         ROCR_1.0-11           limma_3.50.0          tzdb_0.2.0            globals_0.14.0        modelr_0.1.8          matrixStats_0.61.0    spatstat.sparse_2.1-0 colorspace_2.0-2      rvest_1.0.2           haven_2.4.3           xfun_0.29             crayon_1.4.2          jsonlite_1.7.2        spatstat.data_2.1-2   survival_3.2-11       zoo_1.8-9             glue_1.6.0            polyclip_1.10-0       gtable_0.3.0          leiden_0.3.9          future.apply_1.8.1    abind_1.4-5           scales_1.1.1          DBI_1.1.2             miniUI_0.1.1.1        Rcpp_1.0.7            viridisLite_0.4.0     xtable_1.8-4          reticulate_1.22       spatstat.core_2.3-2   htmlwidgets_1.5.4     httr_1.4.2            RColorBrewer_1.1-2    ellipsis_0.3.2        ica_1.0-2             farver_2.1.0          pkgconfig_2.0.3       sass_0.4.0            uwot_0.1.11          
##  [55] dbplyr_2.1.1          deldir_1.0-6          here_1.0.1            utf8_1.2.2            labeling_0.4.2        tidyselect_1.1.1      rlang_0.4.12          reshape2_1.4.4        later_1.3.0           munsell_0.5.0         cellranger_1.1.0      tools_4.1.1           cli_3.1.0             generics_0.1.1        broom_0.7.11          ggridges_0.5.3        evaluate_0.14         fastmap_1.1.0         yaml_2.2.1            goftest_1.2-3         knitr_1.37            fs_1.5.2              fitdistrplus_1.1-6    zip_2.2.0             RANN_2.6.1            pbapply_1.5-0         future_1.23.0         nlme_3.1-152          mime_0.12             xml2_1.3.3            compiler_4.1.1        rstudioapi_0.13       plotly_4.10.0         png_0.1-7             spatstat.utils_2.3-0  reprex_2.0.1          bslib_0.3.1           stringi_1.7.6         highr_0.9             RSpectra_0.16-0       lattice_0.20-44       Matrix_1.3-4          vctrs_0.3.8           pillar_1.6.4          lifecycle_1.0.1       BiocManager_1.30.16   spatstat.geom_2.3-1   lmtest_0.9-39         jquerylib_0.1.4       RcppAnnoy_0.0.19      data.table_1.14.2     cowplot_1.1.1         irlba_2.3.5           httpuv_1.6.5         
## [109] patchwork_1.1.1       R6_2.5.1              bookdown_0.24         promises_1.2.0.1      KernSmooth_2.23-20    gridExtra_2.3         parallelly_1.30.0     codetools_0.2-18      MASS_7.3-54           assertthat_0.2.1      rprojroot_2.0.2       withr_2.4.3           sctransform_0.3.2     mgcv_1.8-36           parallel_4.1.1        hms_1.1.1             grid_4.1.1            rpart_4.1-15          rmarkdown_2.11        Rtsne_0.15            shiny_1.7.1           lubridate_1.8.0